Gemini OCR MCP Server

This project provides an OCR (Optical Character Recognition) service through a FastMCP server, backed by the Google Gemini API. It extracts text from an image supplied either as a file path or as a base64-encoded string.

Objective

Extract the text from an image, such as a CAPTCHA, and convert it to plain text, e.g., fbVk

Features

  • File-based OCR: Extract text directly from an image file on your local system.

  • Base64 OCR: Extract text from a base64 encoded image string.

  • Easy to Use: Exposes OCR functionality as simple tools in an MCP server.

  • Powered by Gemini: Utilizes Google's advanced Gemini models for high-accuracy text recognition.

Prerequisites

  • Python 3.8 or higher

  • A Google Gemini API Key. You can obtain one from Google AI Studio.

Setup and Installation

  1. Clone the repository:

    git clone https://github.com/WindoC/gemini-ocr-mcp
    cd gemini-ocr-mcp

  2. Install uv, which creates and manages the project's virtual environment:

    # Install uv standalone if needed

    ## On macOS and Linux.
    curl -LsSf https://astral.sh/uv/install.sh | sh

    ## On Windows.
    powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

  3. Install the required dependencies:

    uv sync

MCP Configuration Example

If you are running this as a server for a parent MCP application, you can configure it in your main MCP config.json.

Windows Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "x:\\path\\to\\your\\project\\gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Linux/macOS Example:

{
  "mcpServers": {
    "gemini-ocr-mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/your/project/gemini-ocr-mcp",
        "run",
        "gemini-ocr-mcp.py"
      ],
      "env": {
        "GEMINI_MODEL": "gemini-2.5-flash-preview-05-20",
        "GEMINI_API_KEY": "YOUR_GEMINI_API_KEY"
      }
    }
  }
}

Note: Replace the placeholder path with the absolute path to your project directory, and YOUR_GEMINI_API_KEY with your own key.
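
The configuration entry can also be generated rather than typed by hand, which avoids mistakes when escaping backslashes in Windows paths. The following is a minimal sketch only: the helper name build_mcp_config is hypothetical and not part of this project, and it simply reproduces the structure of the examples above using Python's standard json module.

```python
import json


def build_mcp_config(project_dir, api_key, model="gemini-2.5-flash-preview-05-20"):
    """Return an mcpServers entry shaped like the examples above.

    build_mcp_config is a hypothetical helper for illustration;
    the project itself does not ship it.
    """
    return {
        "mcpServers": {
            "gemini-ocr-mcp": {
                "command": "uv",
                "args": ["--directory", project_dir, "run", "gemini-ocr-mcp.py"],
                "env": {"GEMINI_MODEL": model, "GEMINI_API_KEY": api_key},
            }
        }
    }


# A raw string keeps the Windows path readable; json.dumps doubles the
# backslashes so the result is valid JSON.
config = build_mcp_config(r"x:\path\to\your\project\gemini-ocr-mcp", "YOUR_GEMINI_API_KEY")
config_text = json.dumps(config, indent=2)
print(config_text)
```

The printed text can be merged into the parent application's MCP config file. Passing a POSIX path such as /path/to/your/project/gemini-ocr-mcp yields the Linux/macOS variant.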

Tools Provided

ocr_image_file

Performs OCR on a local image file.

  • Parameter: image_file (string): The absolute or relative path to the image file.

  • Returns: (string) The extracted text from the image.
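
Because the server runs as a separate process, a relative path is presumably resolved against the server's working directory rather than the caller's (an assumption; the source does not say). A caller can sidestep the ambiguity by passing an absolute path. The sketch below uses only the standard library; prepare_image_path is a hypothetical client-side helper, not part of this server.

```python
import mimetypes
from pathlib import Path


def prepare_image_path(image_file):
    """Return (absolute_path, mime_type) for an image to pass as image_file.

    Hypothetical client-side helper: it resolves the path against the
    caller's working directory and rejects files that do not look like images.
    """
    path = Path(image_file).expanduser().resolve()
    if not path.is_file():
        raise FileNotFoundError(f"No image file at {path}")
    # Guess the type from the file extension only; the content is not inspected.
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{path.name} does not have a recognised image extension")
    return str(path), mime_type
```

The returned absolute path would then be supplied as the image_file argument when calling ocr_image_file from an MCP client.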

ocr_image_base64

Performs OCR on a base64 encoded image.

  • Parameter: base64_image (string): The base64 encoded string of the image.

  • Returns: (string) The extracted text from the image.
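
To produce the base64_image argument, the raw image bytes are encoded with standard base64. The sketch below assumes the tool expects the bare base64 payload without a "data:image/...;base64," prefix, which the source does not confirm; encode_image_bytes and normalize_base64_image are hypothetical helper names.

```python
import base64
import binascii


def encode_image_bytes(data):
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def normalize_base64_image(value):
    """Strip an optional data URI prefix and check that the rest is valid base64.

    Assumption: the tool wants only the payload, not a full data URI.
    """
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    # Remove line breaks and spaces that some encoders insert.
    value = "".join(value.split())
    try:
        base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError("base64_image is not valid base64") from exc
    return value


# Example with the 8-byte PNG signature standing in for a real image.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = encode_image_bytes(png_signature)
```

For a real image, read the file in binary mode and pass its bytes to encode_image_bytes; the result is the string to supply as base64_image.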

